This study examined the efficacy of a curriculum-based intervention for
high school science students. Specifically, the intervention was two years of
research-based, multidisciplinary curriculum materials for science
supported by comprehensive professional development for teachers that focused
on those materials. A modest positive effect was detected when comparing
outcomes from this intervention to those of business-as-usual materials
and professional development. However, this effect was typical for
interventions at this grade span that are tested using a state achievement
test. Tests of mediation suggest a large treatment effect on teachers and in
turn a strong effect of teacher practice on student achievement, reinforcing
the hypothesized key role of teacher practice. Tests of moderation indicate
no significant treatment by demographic interactions.

The Current State of Curriculum Efficacy Research

Science education is at a critical juncture where evidence of the effects
of curriculum materials is greatly needed. It is not enough to develop
curriculum materials and professional development (PD) programs whose
component features are based on extant research and hope they are effective.
Many stakeholders in education, practitioners in particular, need evidence
from rigorous trials about which comprehensive programs (i.e.,
year-long programs using multiple, integrated features) have the greatest
effects on student outcomes. Although numerous federal agencies have
funded the development of curriculum materials over the past 30 years,
the field of science education still lacks evidence regarding what programs
(or types of programs) have noteworthy effects. As a consequence, school
and district decision makers have had little guidance toward implementing
potentially more efficacious programs that might displace those that have
smaller or no effects (Slavin, 2008) and no way to appropriately respond
to the need for improved STEM education.

The Institute of Education Sciences (IES) established the What Works
Clearinghouse (WWC) to provide stakeholders in education with information
on programs that have undergone rigorous efficacy trials. The WWC
gives stakeholders such as school districts information with which to
make evidence-based decisions related to instructional interventions. At
the time this article was written, just eight studies of science interventions
had met the evidence standards of the WWC (IES, 2014), and only one was
related to curriculum materials or PD for use at the high school level.

This lack of evidence is particularly troubling as we find ourselves at
a time of rapid reform. Now that the new Framework for K-12 Science
Education (National Research Council [NRC], 2012) and the Next
Generation Science Standards (NGSS; NGSS Lead States, 2013) have been
released, teachers across the country will be expected to teach new
disciplinary core ideas, crosscutting concepts, and scientific practices. While
national documents clearly describe the three dimensions of NGSS, minimal
empirical evidence accompanies these documents about the role of
research-based curriculum materials in supporting student attainment of
standards or about the nature of the PD programs most likely to help
teachers implement effective instruction. It is therefore imperative that
curriculum and PD programs that attempt to, or claim to, support such goals
be subjected to rigorous trials that can support confident causal claims
about their impacts.


Study Overview 

This study sought to test the causal link between a curriculum-based
science education intervention and increased student achievement. The
primary goals of the research were to (a) test the overall efficacy of
research-based curriculum materials with associated PD for improving high
school science achievement, (b) explore the role of teacher practice in the
relationship between use of the curriculum materials and improved student
achievement outcomes, and (c) explore the extent to which treatment effects
were equitable across demographic groups. The curriculum materials under study
are titled BSCS Science: An Inquiry Approach (hereafter referred to as An
Inquiry Approach). The materials were developed with funding from the
National Science Foundation (ESI 9911614 and ESI 0242596).

Theoretical Underpinnings of Research-Based Curriculum Materials

In this article, we define curriculum materials to include both the student
text and the teacher support materials. This section addresses the theory
behind both the student and teacher curriculum materials.

The Role of Curriculum Materials in Science Classrooms 

Curriculum materials can be a means to improve student interest and
achievement in science (NRC, 2007). The notion that curriculum materials
truly matter and directly influence the learning process has been supported
in the literature for decades (e.g., Forbes & Davis, 2010; Schmidt, McKnight,
& Raizen, 1997; Usiskin, 1985). Curriculum materials play a defining role in
classrooms, affecting both what and how teachers teach (NRC, 2002). Ball
and Cohen (1996) explain this powerful influence:

Unlike frameworks, objectives, assessments, and other mechanisms 
that seek to guide curriculum, instructional materials are concrete 
and daily. They are the stuff of lessons and units, of what teachers 
and students do. . . . Not only are curriculum materials
well-positioned to influence individual teachers’ work but, unlike many
other innovations, textbooks are already “scaled up” and part of
the routine of schools. They have “reach” in the system. (p. 6)

Further, Schmidt, Houang, and Cogan (2002) caution against efforts to 
improve instruction that are isolated from efforts to improve curriculum 
materials available to teachers and students. “If we pretend that the textbook 
doesn’t exist—and conduct PD in ways that assume teachers will implement 
an entirely different approach to content than the texts take—believe me, the 
textbook will win” (p. 18). 

Constructivism and Student Learning 

Constructivism is a key foundation that frames research-based curriculum
materials designed to emphasize opportunities for students to develop
conceptual understandings of science. Our work is based on two common
theoretical bases for constructivist research: Ausubelian theory (Ausubel,
Novak, & Hanesian, 1978) and the work of L. S. Vygotsky (1978).
Ausubelian theory states that a learner’s prior knowledge is an important
factor in determining what is learned in a given situation. Vygotsky’s work
emphasizes the relationship between the teacher’s prior knowledge and
the students’ prior knowledge as well as the importance of the social
construction of knowledge. Students and teachers may use similar words to
describe concepts yet have very different personal interpretations of those
concepts. Vygotsky’s work implies that science curriculum and instruction
should take into account the differences between teacher and student
conceptions and should provide time for student-to-student interaction so that
learners can develop concepts from those whose understandings and
interpretations are closer to their own.

Discussions of constructivist teaching and learning have been hampered
by inconsistency in how constructivism is envisioned by different scholars
and researchers, including its being equated with completely unguided or
“discovery” learning. Hmelo-Silver, Duncan, and Chinn (2007) discuss the
range of perspectives and offer this definition of a constructivist learning
environment: an environment in which “students are cognitively engaged in
sensemaking, developing evidence-based explanations, and communicating their
ideas. The teacher plays a key role in facilitating the learning process and
may provide content knowledge on a just-in-time basis” (p. 100). It is this
interplay between the student as sensemaker and the teacher as facilitator
that defines our view of constructivist learning environments.

Today we see the roots of constructivism reflected in comprehensive
reviews of the literature on learning, such as How People Learn (NRC,
2000). The authors summarize three key ideas about learning, suggesting
that (a) students come to the classroom with preconceptions that shape their
learning, (b) student competence requires a deep foundation of knowledge as
well as an understanding of how this knowledge relates to a framework,
and (c) students benefit from explicitly monitoring and taking control of
their own learning. The Inquiry Approach curriculum materials incorporated
into the study intervention were strongly influenced by these findings.

Coherence, Focus, and Rigor 

Though more than a decade old, the findings from the Trends in
International Mathematics and Science Study (TIMSS) analysis (Schmidt et al.,
2001) and the research synthesis How People Learn (NRC, 2000) provide clear
and compelling guidance for the development of effective curriculum
materials. In general, these reports indicate that curriculum materials in
the United States need to be more focused by having a storyline organized
around key concepts, more coherent by having explicit connections between
ideas, and more rigorous by setting high standards for learners with respect
to both their cognitive and metacognitive development. Curriculum materials
developed within a framework that is coherent both within years and across
years facilitate a deeper student conceptual understanding (American
Association for the Advancement of Science [AAAS], 2001; Carlson, Davis, &
Buxton, 2014; NRC, 1999, 2007). Yet, persistent evidence indicates that
curriculum materials in science are fragmented, lacking coherence, and not
well articulated through a sequence of grade levels (AAAS, 2001; Kesidou &
Roseman, 2002; Schmidt et al., 1997, 2001; Schmidt, Wang, & McKnight, 2005).
As a result, curriculum materials in the United States generally cover many
concepts, often repeating concepts annually without depth (Schmidt et al.,
1997, 2001). Most materials focus on details that are tangential to the key
ideas and fail to make connections across units when the same key idea is
presented in different ways (Kesidou & Roseman, 2002).

Educative Materials and Teacher Support in the Classroom 

Research-based curriculum materials for students will never eliminate
the important role of the teacher in the classroom. Teachers ultimately shape
how curriculum materials are enacted in the classroom (Beyer & Davis, 2012;
Forbes & Davis, 2010). Teachers select elements of text to include for
instruction, and they emphasize or deemphasize aspects of a curriculum based
on their own understanding and beliefs about what is best for students.
Remillard (2005) described the complex teacher-curriculum relationship as
contextually based, dependent on both the teacher and the curriculum, and
tightly interconnected with other teaching practices. If a teacher’s
understandings and beliefs about instruction align with the philosophy of the
curriculum, then it is likely that there will be a synergistic relationship
between use of the materials and practice (Powell & Anderson, 2002). On the
other hand, a teacher may understand instruction and hold beliefs about
practices that diverge from the philosophy of the materials, creating a gap
between what curriculum developers intended and what the teacher actually
enacts in the classroom (Ball & Cohen, 1996).

Because of the ubiquitous placement of curriculum materials in the school
setting, there is unique potential for curriculum materials to support teachers
as learners. Teachers may use their curriculum materials to deepen their
content knowledge, gain ideas for how to present complex information to
students, or determine how they might assess student learning. Some
researchers describe curriculum materials that explicitly address the teacher
as learner as “educative” (Beyer, Delgado, Davis, & Krajcik, 2009; Davis &
Krajcik, 2005). Davis and Krajcik (2005) identified nine heuristics to
describe educative science materials and how science curriculum materials can
support teachers’ enactment of reform-based instruction with their students.
The heuristics focus on teacher subject matter knowledge as well as
pedagogical content knowledge (PCK). In addition, they articulate the
importance of including a rationale for curricular design decisions and
providing supports for teachers to adapt materials. Specifically, embedded
educative teacher resources can include information and rationale on the
instructional model, additional scientific background, alternative
understandings students may have associated with the content, as well as
suggestions for enhancing students’ abilities to function as a group. As
such, educative science curricula become a resource for teachers, supporting
them as they use materials in their own instructional settings.

Schneider and Krajcik (2002) observed that teachers who use educative
materials enhance their content learning and better implement specific
strategies and representations suggested in the materials. Similarly, Davis
and Krajcik (2005) noted a link between teachers’ use of educative curriculum
materials for science and their PCK for corresponding science topics.


Theoretical Underpinnings of Curriculum-Based Professional Development 

Although educative curriculum materials have clear advantages, they are
often complex, and teachers benefit from additional support to fully
understand the curricula they are trying to implement (Davis & Krajcik,
2005). For example, teachers often do not fully avail themselves of the
information provided in the teacher’s materials. In addition, even when
teachers do study the educative materials, they will likely interpret the
information using their own experiential lenses (McNeill, 2009). Thus, the
evidence suggests that integrating educative materials with face-to-face PD
could be the most effective approach to enhancing teachers’ understanding of
the philosophy and key features of curriculum materials. This should be
particularly true when the PD incorporates elements of effective PD practice.
Loucks-Horsley, Hewson, Love, and Stiles (2003) reviewed extensive research on
the characteristics of effective PD and identified several that are
particularly germane to PD aimed at supporting the use of curriculum
materials. These effective practices include providing coherent, ongoing
(i.e., multi-event) programs that mirror best practice; keeping a focus
directly on learning and teaching; and providing teachers opportunities to
develop deep understanding of concepts and participate in communities of
reflective practice. Last, when teachers adopt research-based curriculum
materials, it is essential that they learn about key features of the
materials as well as the rationale for why developers incorporated those key
features (Lin & Fishman, 2006). These features include the instructional
model for the materials (Penuel, Gallagher, & Moorthy, 2011) as well as key
elements such as building an inquiry culture, using sensemaking strategies,
and understanding the storylines for each unit of instruction. The concept of
educative materials for teachers supported by complementary PD merges two
areas of research: (a) the role of educative materials and (b) the numerous
reports indicating that PD focused on the implementation of well-designed
materials can have a significant impact on teaching and learning (Briars &
Resnick, 2001; Darling-Hammond, 1997; Heller, Daehler, Shinohara, &
Kaskowitz, 2004; Ladewski, 1994; Powell & Anderson, 2002; Schneider &
Krajcik, 2002).

The Resulting Experience for Students Using An Inquiry Approach:
The Treatment Condition

Curriculum Materials 

These three major theoretical underpinnings for curriculum materials
(constructivism, coherence, and educativeness) formed the foundation for
the design of An Inquiry Approach. As a result, this high school program
aims to support not only the learning of science concepts but also the
development of a culture of learning that empowers students and teachers to
learn science and conduct scientific inquiry.

Constructivism 

One of the primary ways that the materials attend to the research on
constructivism is by structuring learning around the BSCS 5E Instructional
Model (Bybee, 1997; Bybee & Landes, 1990). In particular, the BSCS 5E
(engage, explore, explain, elaborate, and evaluate) Instructional Model
supports the teacher in scaffolding the learning experiences for students and
provides a research-based, social constructivist storyline throughout each
chapter. The BSCS 5E Instructional Model organizes the instructional
sequence so that students have multiple opportunities to develop a deep
understanding of concepts through practice, feedback, revision, and
reflection. See Table 1 for details.

Table 1
How Curriculum Materials Embody the Principles of Learning

Principles of Learning From Knowing       Related Features of the Materials
What Students Know [a]
--------------------------------------    ----------------------------------------
Instruction is organized around           Activities center on relevant problems
meaningful problems and goals.            and/or current research with an
                                          inquiry focus.

Instruction must provide scaffolds for    Instruction follows the 5E Instructional
solving meaningful problems and           Model; scaffolding is particularly
supporting learning for                   strong in the explore/explain cycles.
understanding.

Instruction must provide opportunities    The materials include both formative
for practice with feedback, revision,     and summative assessments as well as
and reflection.                           metacognitive strategies aligned with
                                          the activities.

The social arrangements of instruction    The materials include an appropriate
must promote collaboration and            mix of small team activities, partner
distributed expertise as well as          projects, jigsaws, presentations, class
independent learning.                     discussions, and individual work.

[a] Pellegrino, Chudowsky, and Glaser (2001).

In the initial lesson of each chapter, there are opportunities for the
students to consider, express, represent, and share their current
understanding about a concept. This is a critical opportunity for students
that sets the stage for and promotes learning (NRC, 2000). This in turn helps
the teacher frame the subsequent lessons and lab activities where students
have the opportunity to explore questions in small teams. These experiences
result in a set of common experiences on which teams will continue to build
their understanding.

Exploratory lab activities lead to other lessons and interactive readings
that help students formulate and articulate their foundational understanding.
Activities designed to reinforce and expand students’ understanding follow.
These lab activities often ask the students to test their understanding in
a different setting or by adding a new variable. Many of the lessons also
provide opportunities for collaborative learning that model the scientific
enterprise. Such work in heterogeneous groups promotes the back-and-forth
process essential to knowledge construction (Vygotsky, 1962).

The role of formative assessment in these curriculum materials is important
for both the students and the teachers (Atkin, 2002; Black & Wiliam, 1998;
NRC, 2001). During each lesson, the students complete tasks and respond to
questions that serve as benchmarks for students and teachers to assess their
learning experiences. In addition, students have multiple opportunities to
develop explanations from evidence using appropriate scaffolds (McNeill &
Krajcik, 2007). At the end of a chapter, students complete a comprehensive
lab activity or other type of lesson designed to demonstrate their
understanding for themselves and for their teacher with respect to core
concepts presented in the chapter. This experience serves as a summative
assessment for the chapter.

The instructional model in these materials supports both the teacher and
the student in creating a culture of inquiry in the classroom and provides
opportunities for students to develop an understanding of science by
practicing science (NGSS Lead States, 2013) and reflecting on its nature,
their experiences, and their findings. Through a coherent sequence of these
lab activities and questions aimed at improving critical thinking skills, the
students, over time, have opportunities to learn explicitly about the nature
of science in the context of learning rigorous content.


Coherence 

The feature of coherence in curriculum materials is critical to student
learning (Rutherford, 2000; Schmidt et al., 1997) and foundational to the
curriculum materials used during the intervention. Because students are more
likely to learn when their learning experiences are grounded within
a coherent conceptual framework (NRC, 2000), the development of such
a framework was a critical first step in the design of the curriculum
materials used in this study. The use of a framework facilitated the
development of a focused, conceptual storyline within each chapter and across
chapters, resulting in a coherent learning experience for students. Each of
the core units comprises four chapters, with each unit exposing students to
fundamental concepts in one of the science disciplines (i.e., physical science,
life science, earth science, science and society), and this multidisciplinary
cycle repeats in Grades 10 and 11 (see framework in Table 2). The last chapter
in each core unit allows students to apply what they have learned thus far in
an integrated context. The result of this articulation among units is that the
number of major concepts students learn is fewer and the experiences with
those concepts are more sophisticated, with the goal that students’
understanding of them is deeper and more complex (Schmidt et al., 2001).

To further promote coherence in An Inquiry Approach, the development
team used the Understanding by Design process (Wiggins &
McTighe, 2005). This process is divided into three main stages. In Stage I,
Desired Results, the team identified the enduring understandings for students
based on the national standards (the National Science Education Standards at
that time) and scientific expertise. In Stage II, Assessment Evidence, the team
developed the assessment tasks that would serve as evidence that the students
had gained the targeted understandings. In Stage III, Learning Plan, the team
developed the sequence of learning experiences that were hypothesized to
enable students to construct the targeted understandings and be successful on
the assessments. Also, in the third stage, developers made extensive use of
conceptual flow graphics (CFGs). These CFGs are visual diagrams that
illustrate the flow of ideas through the chapter and the relative strength of
conceptual connections among them (see Figure 1). Use of CFGs throughout the
development process refined and strengthened the focus of the materials, in
turn strengthening both rigor and coherence.

Table 2
Curricular Framework for An Inquiry Approach With Next Generation Science
Standards Alignment

Note. Codes beneath main concepts show the alignment of the program with the
Disciplinary Core Ideas (DCI) in the Next Generation Science Standards for
physical sciences (PS), life sciences (LS), earth and space sciences (ESS),
and engineering, technology, and applications of science (ETS).

[Figure 1 graphic not reproduced. It depicts Chapter 2, Earth's Heat Engine,
with Engage and Evaluate labels and concept boxes that include: "The
atmosphere is important for heat transport around the Earth"; "Temperature,
salinity, and the rotation of earth affect the circulation of ocean water and
heat transport"; "Hurricanes are important factors in ocean heat transport";
and "Scientists analyze data to predict hurricanes and their impact."]

Figure 1. Example of a conceptual flow graphic (CFG). The dark arrows represent
connections to the central concept of the chapter. The lighter arrows represent
connections among ideas through the sequence of activities. Dashed arrows
indicate weaker connections.

These processes for developing the materials positioned the program
well with the NGSS, helping to ensure coherence within a unit and across
grade bands. For example, the Inquiry Approach program supports engaging
students in the practices of science in many ways. The BSCS 5E
Instructional Model that organizes and sequences instruction provides
opportunities for students to ask questions about phenomena, design simple
experiments, use evidence collected from their classroom experiences to
develop explanations, and communicate scientific ideas to their peers.
Thus, engaging students in the practices of science is integral to the structure
of the program. Further, the concepts in Table 2 clearly align with the
Disciplinary Core Ideas (DCI) of the Next Generation Science Standards
(NGSS) for high school science. In Table 2, codes beneath main concepts
show the alignment of the program with the DCI in the NGSS for physical
sciences (PS), life sciences (LS), earth and space sciences (ESS), and
engineering, technology, and applications of science (ETS). Finally, Table 3
illustrates an example of how one crosscutting concept is woven across
multiple units and years of the program. All seven crosscutting concepts are
woven throughout all three years of the program.

To make the materials educative, we integrated a variety of teacher
supports that align with the heuristics of educative materials suggested by
Davis and Krajcik (2005). Included in the support materials of An Inquiry
Approach are background on the philosophy behind the instructional
model; practical strategies for implementing the instructional model as
intended; strategies to support meaningful, collaborative learning; and
strategies for empowering students to monitor and support their own learning
(e.g., through the effective use of student notebooks). The teacher materials
also include additional content background for teachers who find themselves
teaching outside their area of expertise. This includes information
on common conceptions students have about specific concepts and how
to best address them (see positive relationship between teacher knowledge
in this area and student outcomes in Sadler, Sonnert, Coyle, Cook-Smith, &
Miller, 2013). It also includes specific ideas for both formative and
summative assessment of student learning.

In sum, An Inquiry Approach is a comprehensive set of curriculum
materials intended to promote the types of learning just described. By
comprehensive, we mean that each level of the program supports teachers
and students for a full year of high school science content without the
need for supplementation and is designed to be used every day for the
entire school year. An Inquiry Approach is a three-year program, typically
used in Grades 9 through 11. However, because the desired outcome measure
was administered in the spring of students’ 10th-grade year, the efficacy
study was limited to estimating the program’s effects after just two
years of use. The treatment condition also included a seven-day PD program
for teachers, provided each year of the curriculum program to support
implementation. The PD program is described in the following
section.


The Professional Development Program 

In the context of the curriculum-based PD provided in this study, we
translated recommended practices into a PD program that engaged teachers
in a year-long experience with a clear focus on student learning and the
effective implementation of the program. The providers of the PD program
were also developers of the curriculum materials. In addition, they had
extensive experience providing PD on this curriculum program. The
research team also attended each PD session and monitored the PD
implementation, providing input as necessary.

Table 3
Examples of How the Inquiry Approach Program Addresses One of the Seven Next
Generation Science Standards (NGSS) Crosscutting Scientific Concepts

The goals of PD were to maximize implementation fidelity by helping
teachers deepen their understanding of the nature of the materials by
modeling lessons, encouraging collaboration around common experiences with
the materials, improving teachers’ content knowledge, and enhancing
their ability to implement the instructional model that organizes and
sequences all instruction in An Inquiry Approach. The seven-day PD program
each year was composed of a three-day summer institute and four
one-day sessions throughout the school year. The extended duration of
the PD enabled us to work with the teachers throughout the year,
particularly at the beginning of each new unit, which focused on a different
science discipline. The extended duration also allowed us to introduce new
features of the program as teachers’ understanding of the program
expanded and their comfort level with previously introduced features
increased. Thus, in this study, the face-to-face PD sessions complemented
the educative aspects of the teacher support materials and aimed to provide
teachers with the experiences needed to take full advantage of
research-based curriculum materials. Teachers in the PD program were
engaged as collaborative learners of content with the PD facilitators from
BSCS. The PD providers used (or approximated) the pedagogical methods
suggested in the program for students. As teachers engaged in activities as
science learners, the activities became the common experience that
anchored subsequent conversations of pedagogy.


The Comparison Condition 

During the two years of the study, the comparison group continued to
use their own extant curriculum materials and received the usual PD
planned by their schools and districts (i.e., business as usual [BaU]). In
Grade 9, teachers in the comparison schools used one of eight different
textbooks provided by their school districts. Most of the eight textbooks were
each used by only one or two comparison teachers. However, the
Prentice Hall Physical Science and Earth Science textbooks were used by
over half of the 28 comparison teachers (16 teachers in one of the larger
participating school districts). In Grade 10, most BaU schools progressed to
their standard 10th-grade biology curriculum and textbooks. That said,
a one-unit sample of artifacts collected by external researchers indicated
that BaU teachers used their textbooks only 24% of the time, indicating that
they supplemented the district-supplied textbook with many other curriculum
materials.

Table 4
Demographics by Treatment Group

Demographics by Group         Treatment (n = 1,509)   Comparison (n = 1,543)
Female                        747                     722
Special education             166                     143
Free/reduced lunch            796                     592
English language learners     94                      70
American Indian               38                      18
Asian                         82                      169
Black                         121                     87
Hispanic                      454                     379
Hawaiian/Pacific Islander     8                       15
Mixed race/ethnicity          19                      81
White                         787                     794
Number of suburban schools    4                       5
Number of rural schools       5                       4


Theory of Change 

Our theory of change is that the combination of educative curriculum
materials for teachers, research-based materials for students, and curriculum-based
PD will produce a positive effect on both students and teachers and that the
effect on students is in part mediated by positive effects on teachers’ practice.
More specifically, research-based student materials provide scaffolding for
exemplary teacher practice while the educative teacher materials and
face-to-face PD provide the necessary supports for teachers to enact that curriculum
within their own contexts.


Study Description 

Setting and Participants 

The study reported here took place in 18 high schools (9 treatment, 9 
comparison) in the state of Washington. These 18 schools initially enrolled 
in the study for the 2009-2010 school year. Approximately half of the 18 
schools were in suburban areas. The remaining schools were in rural areas. 
Table 4 provides an overview of student and school characteristics for the 
treatment and comparison groups. Both groups were somewhat diverse in
terms of student demographics, and each group had a similar blend of
suburban and rural schools. Differences in student demographics across groups
were accounted for in the analyses.

Study Eligibility

All traditional high schools that had not used An Inquiry Approach in 
the past and that participated in the state achievement testing program 
were eligible for the study. Further, for a school to be eligible, the principal 
agreed to encourage teacher attendance at PD sessions and to allow access 
to classrooms for data collection. 

Research Questions 

This study was guided by a primary research question related to the
efficacy of An Inquiry Approach plus curriculum-based PD as well as a set of
exploratory questions related to mediation and moderation of treatment
effects.

Research Question 1: Primary analyses of treatment effects on student
achievement: Controlling for covariates, what is the main effect of treatment on
student achievement?

Research Question 2: Exploratory analyses of mediation: To what extent does
teacher practice mediate the effect of treatment on student achievement?

Research Question 3: Exploratory analyses of moderation (interactions): To what
extent do student demographic characteristics moderate the effect of treatment
on students (i.e., what are the interaction effects of treatment with student
characteristics)?


Design 

This study uses a pretest/posttest control group design (Shadish, Cook,
& Campbell, 2002) where the means of posttreatment outcome measures are
compared across the treatment and comparison groups after being
controlled for pretreatment differences in outcomes. The unit of random
assignment to groups was the school (or “cluster” of students), and as such, the
design is also often referred to as a cluster-randomized trial (Raudenbush,
1997). Neither matching nor blocking was used prior to random assignment
as the late timing of schools joining the study made it impossible for reliable
stratification levels to be established.
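
The unstratified, school-level assignment described above can be sketched as follows. This is only an illustration of simple cluster randomization: the school labels, the seed, and the function name are hypothetical, and the study's actual procedure is known only to have used a random number generator.

```python
import random


def assign_clusters(school_ids, n_treatment, seed):
    """Simple (unstratified) random assignment of whole schools to two arms."""
    rng = random.Random(seed)  # seeded so the allocation can be reproduced
    shuffled = list(school_ids)
    rng.shuffle(shuffled)
    treatment = sorted(shuffled[:n_treatment])
    comparison = sorted(shuffled[n_treatment:])
    return treatment, comparison


# Hypothetical school labels; the study's identifiers are not reported.
schools = [f"school_{i:02d}" for i in range(1, 19)]
treatment, comparison = assign_clusters(schools, n_treatment=9, seed=2009)
print(len(treatment), "treatment schools;", len(comparison), "comparison schools")
```

Because every student in a school shares that school's assignment, the school rather than the student is the unit of randomization, which is why the later analyses use multilevel models.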

Group Allocation and Attrition 

As two of the developers of An Inquiry Approach were also on the
research team, a number of safeguards were employed to limit experimenter
bias. The first such safeguard was using an external researcher to make
random assignments to groups using a random number generator. The
researcher did not make the assignments until schools had consented to
participate in the study, and no schools left the study after learning of their
group assignment. We used information in Table 5 to determine how
equivalent the groups were in science achievement (pretreatment).
Specifically, we compared treatment and comparison group means on the
8th-grade science baseline covariate for the baseline and analytic samples,
respectively. The baseline sample was the set of students in schools who
were randomly assigned to groups at the onset of the study. The analytic
sample was the set of students in schools randomly assigned at the onset
of the study for which a posttreatment (Grade 10) science outcome measure
was available. By comparing the baseline and analytic sample sizes within
Table 5, the degree of outcome measure attrition is apparent. The overall
attrition rate of individual students in the study, based on availability of
the 10th-grade science outcome score, is 18%, with an attrition rate of 18%
in the treatment group and 17% in the comparison group. Thus, the
differential attrition rate across groups is 1%. Table 5 shows that
the random assignment process was not completely successful in distributing
baseline achievement levels evenly across groups. The baseline difference in
the 8th-grade state science achievement scores across groups was
noteworthy (Hedges’ g = .23) and in favor of the comparison group. This baseline
difference was accounted for in the treatment effect models described later
in this article. Finally, we note here that the primary analysis for the main
effect of treatment is an intent-to-treat analysis. Thus, all students retained
their original treatment group assignment in the analytic sample (i.e.,
students crossing groups during the intervention were treated in the analysis
as if they remained in their original treatment group for the full two years).

Measures 

In this section, we describe the two measures used in the analysis. The 
first is the outcome measure used to estimate the main effect of treatment on 
student achievement. The second is a measure of classroom practice and
culture that was used in the mediation analysis.

The Outcome Measure 

In this study, we sought an outcome measure with four key features: (a)
importance to all stakeholders, including teachers, students, parents, and
administrators; (b) alignment with the student abilities and understandings that the
Inquiry Approach curriculum materials seek to improve; (c) a fair outcome
measure for the comparison group; and (d) strong psychometric properties. For this
study, the outcome measure chosen was the Washington state science
assessment (the High School Proficiency Exam; HSPE), which clearly meets the first
criterion. As for the second criterion, the HSPE has reasonable alignment with
the Inquiry Approach program due to its broad coverage of science content
(earth/space, physical, and life sciences) as well as its focus on science practices
(e.g., developing questions and designing investigations, evidence as the basis
for explanations and models, and communicating scientific results). We
concluded that the third criterion would be met by the HSPE as this test is based
on Washington state standards to which all schools (treatment and comparison)
are held accountable. The psychometric properties of the HSPE are strong. For
the 10th-grade science outcome measure, the reported Cronbach’s alpha is .87.
For the baseline achievement covariates (the state achievement test scores for
7th-grade writing, 8th-grade math, and 8th-grade science), the alpha values
are .77, .90, and .89, respectively (Education Testing Service, 2012).

Table 5

Pretreatment Sample Sizes and Characteristics for the Baseline and Analytic Samples in a Two-Level Cluster-Randomized Trial


The Teacher Practice Measure 

We used the Reformed Teaching Observation Protocol (RTOP; Piburn
et al., 2000; Sawada et al., 2002) as the primary measure of teacher practice.
From this point forward, all references to “teacher practice” should be read as
“teacher practice as indicated by the RTOP.” The RTOP instrument measures
the extent to which science and mathematics teaching aligns with the
recommendations for research-based instructional reform described in national science
and mathematics standards documents of the late 1990s. The instrument is made
up of 25 Likert-type items, divided into five subscales: (a) Lesson Design &
Implementation, (b) Content: Propositional Knowledge, (c) Content:
Procedural Knowledge, (d) Classroom Culture: Communicative Interactions,
and (e) Classroom Culture: Student-Teacher Relationships. A total score across
all items is also calculated. Each item is scored from 0, behavior never
occurred, to 4, pervasive or extremely descriptive of the lesson. As
a whole, the protocol addresses teacher attention to students’ prior knowledge,
student engagement in a learning community, and the extent to which teachers
support an atmosphere of problem solving and student-generated ideas.
Validation studies of the RTOP suggest that it can have strong psychometric
properties. The reliability estimate ($R^2$) for the entire instrument is .954
(Piburn et al., 2000).

To guard against experimenter bias, two external researchers conducted the 
classroom observations. To calibrate the observers before the site visits, the two 
observers watched classroom instruction from video recordings. After each 
video, the observers then independently provided ratings across the 25 RTOP 
items. After this “pre-discussion” rating, the two observers discussed their ratings 
and the basis for their ratings. 

Classroom observations were made throughout the year. Nearly all
teachers in this study were observed eight times (approximately once each
month). We chose this comprehensive approach to increase the likelihood
that the average RTOP score for each teacher was representative of his or
her typical practice. In the analyses, the outcome measure for teachers is
their mean RTOP score across their observations. The external observers
of teacher practice were never told of teachers’ treatment group assignments.
However, group membership likely became discernible as the students used their
Inquiry Approach textbooks, which are designed to be used most days, if not every
day, in class. Therefore, we acknowledge that we cannot rule out the
possibility of observer bias in the ratings of teacher practice (RTOP scores).

Because two external researchers each scored one half of the treatment 
and comparison classrooms, interrater reliability was calculated to test for
consistency in scoring between raters. A sample of 7.4% of the observations was
scored by both raters (29 out of a total of 394), and interrater reliability was 
calculated using the intraclass correlation coefficient (ICC). Analysis of the 
commonly scored observations yielded an intraclass correlation coefficient 
of .966 (total RTOP scores, two-way mixed effects model, absolute agreement, 
average measures). Interpretation of the intraclass correlation coefficient is 
similar to that of Cohen’s kappa, where a commonly used rule of thumb is 
that .40 to .59 represents moderate interrater reliability, .60 to .79 is substantial, 
and greater than .80 is outstanding (Landis & Koch, 1977). 
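
To make the reported coefficient concrete, the sketch below computes an absolute-agreement, average-measures intraclass correlation from two-way ANOVA mean squares. It is a minimal illustration, not the software routine used in the study; the function name and the six pairs of total RTOP scores are hypothetical.

```python
def icc_absolute_average(ratings):
    """ICC (two-way layout, absolute agreement, average measures).

    ratings: list of rows, one row per observed lesson, one column per rater.
    """
    n = len(ratings)       # number of lessons (targets)
    k = len(ratings[0])    # number of raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Sums of squares for lessons, raters, and residual error.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    # Rater mean differences (ms_cols) lower the coefficient, which is what
    # distinguishes absolute agreement from mere consistency.
    return (ms_rows - ms_error) / (ms_rows + (ms_cols - ms_error) / n)


# Hypothetical total RTOP scores (0-100) from two raters on six lessons.
scores = [[72, 70], [55, 58], [81, 80], [60, 63], [48, 47], [66, 68]]
icc = icc_absolute_average(scores)
print(f"ICC = {icc:.3f}")
```

With only two raters and a few dozen doubly scored lessons, the estimate depends heavily on how much the lessons differ from one another, so a high value reflects both rater agreement and a wide spread of scores.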

The same external researchers also used a Fidelity of Implementation 
Observation Protocol (BSCS, 2009) to assess treatment teachers’ use of the 
instructional materials with students. In particular, the tool examined the 
quality of teachers’ use of the BSCS 5E Instructional Model. The average 
score on this protocol across 183 independent observations of treatment 
teachers was 2.13 on a 3-point scale (71%), indicating overall program use 
consistent with the developers’ intent. 


Analysis and Findings 

Confirmatory Analyses: Main Effect of Treatment on Students 

The primary purpose of the study was to address Research Question 1: 
Controlling for covariates, what is the main effect of treatment on student 
achievement? Because assignment to treatment or comparison (BaU) conditions 
occurred at the school level while the outcome of interest occurs at the student 
level, we chose multilevel modeling to estimate the effects of the treatment on 
student achievement. Preliminary analyses confirmed this modeling choice as 
the effect of clustering is sizeable (unconditional ICC = .13), and the data 
meet the multilevel modeling assumptions of homogeneity of Level 1 variance 
($\chi^2$ = 22.87, p = .153) and normality of residuals (Q-Q plots of residuals at
both levels have patterns that are generally linear). 

To refine the estimate of the treatment effect, we included in the model
a set of covariates that we hypothesized to be correlated with the outcome
measure: students’ scores on the 10th-grade state science assessment
(SCI10). This set of covariates included both demographic and achievement
variables. Some student-level (Level 1) covariates were also used in aggregate
at the school level (Level 2). The Level 1 covariates included achievement
scores such as those from the 8th-grade state science assessment (SCI8), the
8th-grade state math assessment (MAT8), and the 7th-grade state writing
assessment (WRIT7). The Level 1 demographic covariates included free and
reduced-price lunch status (FRL) as a proxy for socioeconomic status, gender
(GEND), English language learner status (ELL), special education status
(SPED), grade level (GRADE), and a set of race contrast codes that include
American Indian (AMIND), Asian (ASIAN), Black (BLK), Hispanic/Latino
(HISP), Hawaii/Pacific Islander (HPI), and those who indicated two or more
ethnicities (MIX), each of which allows achievement comparisons between
the selected group of students and the reference group of White students.

The Level 2 model included the treatment variable (TREAT) as well as 
a school mean aggregate for the eighth-grade science assessment score 
(MNSCI8), a school mean aggregate for the eighth-grade math assessment 
score (MNMAT8), and a school mean aggregate for the FRL status 
(MNFRL). Because we had large numbers of Level 1 units (students) in the 
sample but a relatively small number of Level 2 units (just 18 schools), we 
were much more judicious in including Level 2 covariates, including only 
the most theoretically influential on the outcome, as each of these consumes 
a degree of freedom in the Level 2 statistical significance tests. We
grand-mean centered all independent variables to facilitate the desired covariate
adjustment. The main effect model was specified and run as described 
next using STATA 12 statistical software. 

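Level 1:

$$
\begin{aligned}
\mathrm{SCI10}_{ij} ={}& \pi_{0j} + \pi_{1j}(\mathrm{SCI8})_{ij} + \pi_{2j}(\mathrm{MAT8})_{ij} + \pi_{3j}(\mathrm{WRIT7})_{ij} + \pi_{4j}(\mathrm{FRL})_{ij} \\
&+ \pi_{5j}(\mathrm{GEND})_{ij} + \pi_{6j}(\mathrm{ELL})_{ij} + \pi_{7j}(\mathrm{SPED})_{ij} + \pi_{8j}(\mathrm{GRADE})_{ij} \\
&+ \pi_{9j}(\mathrm{AMIND})_{ij} + \pi_{10j}(\mathrm{ASIAN})_{ij} + \pi_{11j}(\mathrm{BLK})_{ij} + \pi_{12j}(\mathrm{HISP})_{ij} \\
&+ \pi_{13j}(\mathrm{HPI})_{ij} + \pi_{14j}(\mathrm{MIX})_{ij} + e_{ij}.
\end{aligned}
$$

Level 2:

$$
\begin{aligned}
\pi_{0j} ={}& \beta_{00} + \beta_{01}(\mathrm{TREAT})_{j} + \beta_{02}(\mathrm{MNSCI8})_{j} \\
&+ \beta_{03}(\mathrm{MNMAT8})_{j} + \beta_{04}(\mathrm{MNFRL})_{j} + r_{0j}.
\end{aligned}
$$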
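
For readers who prefer a single equation, the two levels can be combined by substituting the Level 2 equation into the Level 1 intercept. The sketch below assumes that the 14 Level 1 slopes are fixed across schools (i.e., $\pi_{pj} = \beta_{p0}$ for $p = 1, \dots, 14$), which the text does not state explicitly, and uses $X_{pij}$ as hypothetical shorthand for the $p$th grand-mean-centered Level 1 covariate.

$$
\begin{aligned}
\mathrm{SCI10}_{ij} ={}& \beta_{00} + \beta_{01}(\mathrm{TREAT})_{j} + \beta_{02}(\mathrm{MNSCI8})_{j} + \beta_{03}(\mathrm{MNMAT8})_{j} + \beta_{04}(\mathrm{MNFRL})_{j} \\
&+ \sum_{p=1}^{14} \beta_{p0} X_{pij} + r_{0j} + e_{ij}.
\end{aligned}
$$

Under that assumption, $\beta_{01}$ is the covariate-adjusted difference in mean SCI10 between treatment and comparison schools, $r_{0j}$ is the school-level random effect, and $e_{ij}$ is the student-level residual.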


Descriptive statistics for the outcome variable (SCI10) are provided in 
Table 6. The main effect estimates from the multilevel model are provided 
in Table 7. 

This output suggests that the treatment group students would have
scored an estimated 3.68 scale score points higher, on average, than students
in the comparison group had the groups been fully equivalent prior to
treatment. This difference ($\beta_{01}$) is statistically significant at the $\alpha$ = .05
significance level (p = .035).
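
As a rough check on how the reported test statistic, p value, and confidence interval relate to one another, the sketch below recomputes them from the treatment coefficient (3.68) and its standard error (1.75, from Table 7). It assumes a large-sample normal (z) reference distribution and works from rounded inputs, so the last digit can differ slightly from the published values.

```python
import math

coef = 3.68  # treatment coefficient in scale score points, as reported
se = 1.75    # standard error of the coefficient, as reported in Table 7

z = coef / se
# Two-sided p value under the standard normal distribution.
p_two_sided = math.erfc(abs(z) / math.sqrt(2))

z_crit = 1.959964  # 97.5th percentile of the standard normal
ci_low = coef - z_crit * se
ci_high = coef + z_crit * se

print(f"z = {z:.2f}, p = {p_two_sided:.3f}, 95% CI = [{ci_low:.2f}, {ci_high:.2f}]")
```

The interval excludes zero, which is the same information conveyed by p < .05.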

Like many efficacy trials, this study was subject to attrition and a resulting
loss of data. As a result, the research team replicated this treatment effect
analysis after imputing the missing data using a multiple imputation
algorithm within STATA 12. Specifically, within STATA, we used the EM
Algorithm with multiple imputations to address missing data. The treatment
effect estimate from the identical model applied to the imputed data sets
yielded very similar results ($\beta_{01}$ = 3.37, SE = 1.65, p = .041), suggesting
that the missing data did not introduce a systematic bias in the treatment
effect estimate.

Table 6

Posttreatment Outcomes for the Analytic Sample and Estimated
Effects in a Two-Level Cluster-Randomized Trial

                     Treatment Group          Comparison Group         Estimated Effects
                     Covariate-  Unadjusted   Covariate-  Unadjusted   Covariate-
                     Adjusted    Standard     Adjusted    Standard     Adjusted
Outcome Measure      Mean        Deviation    Mean        Deviation    Mean Difference   p Value
10th grade science   384.52      43.305       380.84      39.119       3.68              .035


Table 7

Estimates of Fixed Effects on 10th-Grade Science Achievement Score

Independent                         Standard                       95% Confidence
Variable      Level    Coefficient  Error     z Value   p Value    Interval
SCI8          Student     0.71      0.03       26.14    <.001        0.65     0.76
MAT8          Student     0.32      0.02       16.13    <.001        0.28     0.35
WRIT7         Student     0.80      0.32        2.50     .012        0.17     1.43
FRL           Student    -2.61      1.01       -2.59     .010       -4.58    -0.63
GEND          Student    -5.41      0.90       -6.02    <.001       -7.17    -3.65
ELL           Student    -8.86      2.20       -4.02    <.001      -13.17    -4.54
SPED          Student    -4.70      1.62       -2.90     .004       -7.88    -1.52
GRADE         Student   -11.96      2.54       -4.72    <.001      -16.93    -6.99
RACE-AMIND    Student    -6.92      3.28       -2.11     .035      -13.36    -0.48
RACE-ASIAN    Student    -1.70      1.70       -1.00     .317       -5.05     1.64
RACE-BLK      Student    -5.31      1.82       -2.92     .004       -8.87    -1.74
RACE-HISP     Student    -5.38      1.29       -4.18    <.001       -7.90    -2.86
RACE-HPI      Student   -10.25      5.36       -1.91     .056      -20.76     0.26
RACE-MIX      Student    -3.92      2.47       -1.59     .113       -8.75     0.92
TREAT         School      3.68      1.75        2.11     .035        0.25     7.10
MNSCI8        School      0.34      0.34        1.00     .318       -0.32     1.00
MNMAT8        School      0.08      0.23        0.34     .734       -0.38     0.54
MNFRL         School     -2.89      7.89       -0.37     .715      -18.34    12.57

As an additional way to interpret the treatment effect, the research team
also computed the effect size. The Hedges’ g effect size (with small sample
size adjustment $\omega$), a measure of practical significance, was computed by
using the treatment effect coefficient ($\beta_{01}$) from the multilevel model as
the covariate-adjusted mean difference across groups (the numerator). The
denominator was the pooled standard deviation weighted for sample size
differences across groups:

$$
g = \frac{\omega\,\beta_{01}}{\sqrt{\dfrac{(n_t - 1)s_t^2 + (n_c - 1)s_c^2}{n_t + n_c - 2}}}
$$
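
The calculation can be sketched numerically as follows. Two inputs are assumptions rather than values confirmed by the text: the small-sample correction is taken to be the commonly used $\omega = 1 - 3/(4N - 9)$, and the group sizes from Table 4 (1,509 and 1,543) stand in for the analytic sample sizes, which are reported in Table 5. The mean difference and the unadjusted standard deviations come from Table 6.

```python
import math


def hedges_g(mean_diff, sd_t, n_t, sd_c, n_c):
    """Hedges' g from an adjusted mean difference and unadjusted group SDs."""
    pooled_sd = math.sqrt(
        ((n_t - 1) * sd_t ** 2 + (n_c - 1) * sd_c ** 2) / (n_t + n_c - 2)
    )
    # Assumed small-sample correction; the exact form of omega is not
    # given in the text.
    omega = 1 - 3 / (4 * (n_t + n_c) - 9)
    return omega * mean_diff / pooled_sd


g = hedges_g(
    mean_diff=3.68,  # covariate-adjusted mean difference
    sd_t=43.305,     # unadjusted SD, treatment group
    n_t=1509,        # stand-in group size (assumption)
    sd_c=39.119,     # unadjusted SD, comparison group
    n_c=1543,        # stand-in group size (assumption)
)
print(f"Hedges' g = {g:.2f}")
```

With samples this large the correction factor is nearly 1, so the result is driven almost entirely by the ratio of the mean difference to the pooled standard deviation.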


The Hedges’ g value for the treatment effect was .09 standard deviations. The
95% confidence interval for the effect size is [.01, .17]. The initial power
analysis was conducted using Optimal Design Plus Empirical Evidence (v.3)
software (Raudenbush et al., 2011). In this analysis, we assumed an
unconditional ICC of .15, a Level 2 covariate correlation ($R^2$ value) of .50, 18
schools, and 150 students per school. These values corresponded to a
minimum detectable effect size of d = .40. In actuality, the power was
substantially better than we anticipated, allowing us to detect a much smaller effect
than expected. First, the actual ICC was lower than our initial estimate
(unconditional ICC = .13), and the average number of students per school
was higher (n = 170). In addition, the analysis included covariates at both
Level 1 and Level 2 of the model, and the Level 2 covariates provided us
with much better precision than we anticipated ($R^2_{L2}$ = 0.91). As a result, we
had sufficient power in our study to detect a smaller effect size than we
initially expected.
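
The planning figure can be approximated with a widely used closed-form expression for the minimum detectable effect size in a two-level cluster-randomized design. This is a sketch and not a reproduction of the Optimal Design software: the formula, equal allocation to the two arms, no Level 1 covariates at the planning stage, and a multiplier of about 3.0 (roughly the sum of the t critical values for a two-tailed $\alpha$ of .05 and power of .80 with about 15 degrees of freedom) are all assumptions.

```python
import math


def mdes_cluster_rct(n_clusters, cluster_size, icc, r2_level2,
                     r2_level1=0.0, p_treated=0.5, multiplier=3.0):
    """Approximate minimum detectable effect size, two-level cluster RCT."""
    denom = p_treated * (1 - p_treated) * n_clusters
    between = icc * (1 - r2_level2) / denom                         # school-level variance term
    within = (1 - icc) * (1 - r2_level1) / (denom * cluster_size)   # student-level variance term
    return multiplier * math.sqrt(between + within)


# Planning assumptions reported in the text.
planned = mdes_cluster_rct(n_clusters=18, cluster_size=150, icc=0.15, r2_level2=0.50)
print(f"Planned MDES is approximately {planned:.2f}")
```

The between-school term dominates the result, which is why a lower ICC and a much higher Level 2 $R^2$ improve precision far more than additional students per school.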

The WWC would characterize an effect size of .09 as a “statistically
significant positive effect.” Although the WWC reserves the characterization of
“substantively important” for effects larger than .25, this effect is meaningful
in the context of the small effect sizes often observed in high school
interventions. For example, Hill, Bloom, Black, and Lipsey (2007) conducted
a synthesis of effect sizes for randomized control trials and found that for
elementary school studies, the average effect size was .33, and for high school
studies, the average effect size was .27. However, this high school average
included effect sizes computed on outcome measures that were proximal
or targeted to the intervention. Based on what Hill and colleagues observed
in the elementary school studies, inclusion of proximal effect sizes likely
inflated the average effect size for high school interventions. Specifically,
they found that the average effect size varied by the breadth of focus for
the outcome measure, reporting that “within studies of elementary schools,
mean effect sizes are highest for specialized tests (0.44), next-highest for
narrowly focused standardized tests (0.23), and lowest for broadly focused
standardized tests (0.07)” (p. 8). Given that the effect size reported for this study
(.09) was computed using scores from a broadly focused standardized test
(HSPE 10 Science) and is associated with a high school intervention, we
find the effect size to be within expectation.

Another way to interpret this effect size is to compare it to normative
expectations for achievement growth (i.e., average pre-post year effect sizes
for 9th- and 10th-grade science students). This effect size expresses students’
expected gain in science achievement over the course of one year. Looking
across a set of nationally normed tests, Bloom and colleagues (2008)
estimated that the average pre-post year effect size for science in 9th grade is
.19 and .22 for 10th grade. Thus, the two-year expected gain in achievement
can be estimated as .41 standard deviations. The effect size of .09 detected in
this two-year intervention study is noteworthy as it corresponds to .09/.41 or
22% of the two-year expected gain. Multiplying .22 by 18 school months for
a two-year intervention, we estimate that treatment group students emerge
from the study (i.e., start 11th grade) nearly four months ahead of
comparison group students in science achievement.
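
The arithmetic behind the "months ahead" estimate can be laid out step by step. The sketch below uses only the figures quoted in the text; the variable names are illustrative.

```python
gain_grade9 = 0.19    # normative one-year gain in science, 9th grade (SD units)
gain_grade10 = 0.22   # normative one-year gain in science, 10th grade (SD units)
effect_size = 0.09    # treatment effect (Hedges' g)
school_months = 18    # two school years of nine months each

two_year_gain = gain_grade9 + gain_grade10
share_of_gain = effect_size / two_year_gain
months_ahead = share_of_gain * school_months

print(f"Share of two-year gain: {share_of_gain:.0%}")
print(f"Months ahead: {months_ahead:.1f}")
```

The conversion treats achievement growth as linear across the 18 months, so it is best read as an order-of-magnitude translation rather than a precise measure of time.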

As a final way to express the practical importance of the treatment effect,
we converted the effect size into an improvement index using the properties
of the normal distribution. In a normal distribution, a 1.0 SD effect size is
equivalent to 34 percentile points. Therefore, an effect size of .09 equates to an
improvement index of 3.06 (34 × .09) percentile points. So, if the comparison
students were at the mean of the normed sample, the 50th percentile, the
treatment group students would then be placed at just over the 53rd percentile.
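
The conversion above scales the 34 percentile points associated with one standard deviation linearly. As a hedged point of comparison, the sketch below also evaluates the standard normal cumulative distribution function directly at the effect size, which is another common way to compute an improvement index; for small effects this yields a somewhat larger figure because the normal density is highest near the mean.

```python
from statistics import NormalDist

effect_size = 0.09

# Linear scaling used in the text: 34 percentile points per 1.0 SD.
linear_points = 34 * effect_size

# Direct evaluation of the standard normal CDF at the effect size.
cdf_points = (NormalDist().cdf(effect_size) - 0.5) * 100

print(f"Linear rule: {linear_points:.2f} percentile points")
print(f"Normal CDF:  {cdf_points:.2f} percentile points")
```

Either way, the typical treatment student would be expected to score a few percentile points above the typical comparison student.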

There are two key reasons why these estimates of the true treatment 
effect are likely conservative. First, we join other researchers who have 
noted that using new interventions that require unfamiliar practices can 
often lead to an “implementation dip,” where use of the program features 
is mechanistic and can result in a negative effect on outcomes for some 
time prior to ultimately improving outcomes (e.g., Fullan, 2001; Hall & 
Hord, 2001). An Inquiry Approach encourages teachers to use instructional 
practices that are not commonplace in high schools (Banilower et al., 2013), and 
as such, these practices were likely unfamiliar to a majority of treatment teachers. 
This implementation dip, if a factor in this study, would reduce the size of the 
treatment effect. In contrast, many comparison teachers were using programs 
or sets of activities that they used routinely prior to the research. Second,
observations indicate that the learning experiences of students in the comparison
group included some research-based practices similar to those promoted in 
the treatment group. This would also tend to reduce the treatment effect. 

Other main effects in Table 7 are interesting as well. The achievement
covariates were all highly predictive of the SCI10 outcome measure. In
general, the race covariates have statistically significant ($\alpha$ = .05) main effects on
achievement with the exception of the race dummy codes that compare the
achievement of Asian, Hawaiian/Pacific Islander, and mixed ethnicity
students to White students, respectively. In this sample, males had higher
mean scores than females, economically advantaged students had higher
mean scores than economically disadvantaged students, native language
English speakers had higher mean scores than English language learners,
and students without a special education designation had higher mean
scores than those with such a designation. In a later section, we report
whether the relationships between key demographic variables and
achievement described here differ by treatment group.


Exploratory Analyses: Mediation and Indirect Treatment Effects 

The purpose of the mediation analysis was to address Research
Question 2: To what extent does teacher practice mediate the effect of
treatment on student achievement? The research team hypothesized that the
nature of teachers’ practice is critical to the efficacy of the curriculum
materials and that improving teacher practice is part of the mechanism by which
the causal effect of the treatment is realized. This is supported by syntheses
of intervention studies such as that conducted by Nye, Konstantopoulos, and
Hedges (2004), who observed that the proportion of variance in student
outcomes attributable solely to between-teacher variance can be as much as
20%. As a result, this efficacy study sought to use the RTOP to collect
comprehensive data about teacher practices to test whether the treatment
(curriculum materials plus PD) has an indirect effect on students’ science
achievement via teacher practice as a mediating variable (see Figure 2).

In Figure 2, path a represents the effect of the treatment on teacher
practice (RTOP); path b represents the effect of teacher practice (the mediator)
on the science achievement outcome (SCI10), controlling for the treatment;
and path c' represents the effect of the treatment on the science
achievement outcome (SCI10), controlling for teacher practice. That is, c' is the
direct (unmediated) effect of treatment. The product of paths a and b is often
used to represent the mediating (indirect) effect of the treatment on the
outcome (MacKinnon, 2008).

The mediation design for this study is often referred to as a 3 → 2 → 1
design because the treatment is at the third level (school), the mediator is
measured at the second level (teacher), and the outcome is measured at
the first level (student). We tested mediation in this 3 → 2 → 1 design
using a modeling approach advocated by leading methodologists
(MacKinnon, 2008; Pituch, Murphy, & Tate, 2010). In this approach, separate
equations for the mediator and outcome can be used to estimate the indirect
effect. The first set of equations in the following estimates path a. The
teacher-level equation for the mediator is

$$\mathrm{RTOP}_{ij} = \pi_{0j} + r_{0ij},$$

where $\mathrm{RTOP}_{ij}$ is the teacher-level mediator, $\pi_{0j}$ is the RTOP mean for school
j, and $r_{0ij}$ is the teacher-level random effect. The school-level equation is

$$\pi_{0j} = \beta_{00} + \beta_{01}\,\mathrm{TREAT}_{j} + u_{0j},$$

where $\beta_{01}$ is the effect of the treatment on the RTOP scores (path a of Figure
2), and $u_{0j}$ is the school-level random effect. The main effect of treatment on
teacher practice estimated from these models is $\beta_{01}$ = 16.74 (SE = 3.11, p <
.001), corresponding to raw group means and standard deviations of 71.4
(10.1) and 55.0 (7.6) for treatment and BaU, respectively. This is a large effect
with corresponding Hedges’ g value = 1.85. Estimating the b and c' paths
requires a three-level model. The student-level equation for the outcome is

$$\mathrm{SCI10}_{ijk} = \pi_{0jk} + e_{ijk},$$

where $\pi_{0jk}$ represents the outcome mean for teacher j of school k, and $e_{ijk}$ is
the student-level random effect. The teacher-level equation adds the
mediator (RTOP) as a predictor:

$$\pi_{0jk} = \beta_{00k} + \beta_{01k}(\mathrm{RTOP})_{jk} + r_{0jk},$$

where $\beta_{01k}$ represents the within-school impact of RTOP on the mean SCI10
score. The school-level equations are

$$\beta_{00k} = \gamma_{000} + \gamma_{001}(\mathrm{TREAT})_{k} + u_{00k} \quad \text{and} \quad \beta_{01k} = \gamma_{010},$$

where $\gamma_{001}$ is path c' of Figure 2 and $\gamma_{010}$ is the fixed effect of RTOP on
SCI10 (controlling for treatment), or path b. The fixed effects from this
three-level model are c' ($\gamma_{001}$) = 1.56 (SE = 2.21, p = .49) and b ($\gamma_{010}$) =
0.13 (SE = 0.07, p = .07). The presence of a strong treatment effect on the
teacher-practice mediator, a nearly significant association between teacher
practice and student achievement, and a small remaining direct treatment
effect (c') is consistent with our mediational hypothesis. A more formal
test is described in the following.

The indirect effect of teacher practice can be estimated as the product of
the a and b paths or the ab product. This product is $(\beta_{01})(\gamma_{010})$ =
(16.74)(0.13) = 2.18. The 95% confidence interval for the indirect effect
was computed using the PRODCLIN Program (MacKinnon, Fairchild, &
Fritz, 2007), yielding [-0.12, 4.84] and a corresponding probability of Type
I error (p = .064). Although mediation is sometimes considered to be present
only when the ab product is statistically significant at $\alpha$ = .05, we consider
this result suggestive of mediation and worthy of discussion.

Figure 2. Mediation of the treatment effect with unstandardized coefficients.
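
The indirect effect and its uncertainty can be approximated from the reported path estimates. The sketch below is not the PRODCLIN procedure: it substitutes a first-order (Sobel-type) standard error and a Monte Carlo interval, treats the a and b estimates as independent normal draws, and works from rounded published values (a = 16.74, SE = 3.11; b = 0.13, SE = 0.07). The seed and number of draws are arbitrary choices.

```python
import math
import random

a, se_a = 16.74, 3.11  # path a: treatment -> RTOP
b, se_b = 0.13, 0.07   # path b: RTOP -> SCI10, controlling for treatment

indirect = a * b

# First-order delta-method (Sobel-type) standard error of the product.
sobel_se = math.sqrt(a ** 2 * se_b ** 2 + b ** 2 * se_a ** 2)

# Monte Carlo interval for the product of two independent normal estimates.
rng = random.Random(12345)
draws = sorted(rng.gauss(a, se_a) * rng.gauss(b, se_b) for _ in range(200_000))
mc_low = draws[int(0.025 * len(draws))]
mc_high = draws[int(0.975 * len(draws))]

print(f"ab = {indirect:.2f}, first-order SE = {sobel_se:.2f}")
print(f"Monte Carlo 95% interval: [{mc_low:.2f}, {mc_high:.2f}]")
```

The simulated interval is asymmetric around the point estimate because the product of two normal variables is skewed, which is the reason distribution-of-the-product methods are generally preferred to a symmetric normal interval.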

Direct and indirect effects can be parsed using the following expression:

$$\tau = \tau' + ab,$$

where $\tau$ is the total effect (3.68) from the main effect of treatment analysis, $\tau'$
is the direct (unmediated) effect (1.56) from the mediation analysis, and ab is
the indirect effect of teacher practice on student achievement (2.18).

From this decomposition of effects, we can estimate that the indirect or 
mediation effect of teacher practice is (2.18/3.68) or 59% of the total effect of 
the treatment. The magnitude of this proportion supports our hypothesis 
that teacher practice truly matters in the implementation of the curriculum 
program. Note that for this mediation test and for the moderation tests 
that follow, there was no adjustment to the significance level for multiple 
hypothesis tests as these tests are framed as exploratory. 
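
The proportion mediated follows directly from the reported estimates, as the short sketch below shows. It also sums the direct and indirect estimates for comparison with the total effect; the small gap between the two is presumably because the total effect and the mediation paths were estimated in separately specified models, which is an interpretation offered here rather than a statement from the text.

```python
total_effect = 3.68     # tau, from the main effect model
direct_effect = 1.56    # tau prime (path c'), from the three-level model
indirect_effect = 2.18  # ab product

proportion_mediated = indirect_effect / total_effect
reconstructed_total = direct_effect + indirect_effect

print(f"Proportion mediated: {proportion_mediated:.0%}")
print(f"Direct + indirect:   {reconstructed_total:.2f} (reported total: {total_effect:.2f})")
```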


Exploratory Analyses: Testing Moderation Through Interaction Effects 

The purpose of the moderation analysis was to address Research
Question 3: To what extent do student demographic characteristics
moderate the effect of treatment on students (i.e., what are the interaction effects
of treatment with student characteristics)? Given that nearly all the
demographic variables had significant main effects on achievement, it became
important to take a comprehensive approach to testing whether these
main effects were maintained when students were disaggregated by
treatment group. Thus, we specified a random slopes model to test the
cross-level (treatment at Level 2, demographics at Level 1) interactions between
treatment and all of the demographic variables of interest. The results of the
cross-level interaction analyses were mixed (i.e., some positive, some
negative), but none of the effects were statistically significant at the $\alpha$ = .05 level.
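
One common way to write such a cross-level interaction, offered here as an illustrative sketch and not as the authors' exact specification, is to let the Level 1 slope for a demographic covariate $p$ vary across schools and depend on treatment status:

$$\pi_{pj} = \beta_{p0} + \beta_{p1}(\mathrm{TREAT})_{j} + r_{pj}.$$

In this form, $\beta_{p0}$ is the average slope for covariate $p$ in comparison schools, $\beta_{p1}$ is the cross-level interaction (the amount by which that slope differs in treatment schools), and $r_{pj}$ is the school-specific random deviation in the slope. A test of moderation for covariate $p$ is then a test of whether $\beta_{p1}$ differs from zero. The subscripts follow the notation of the main effect model, and the symbol $r_{pj}$ is an assumption introduced for this sketch.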

Discussion and Implications 

Efficacy studies such as the one described in this article are urgently 
needed (Hmelo-Silver et al., 2007). For example, there is an ongoing debate 
about whether students are better served by direct instruction or constructivist 
approaches to learning (Kirschner, Sweller, & Clark, 2006; Tobias & Duffy, 
2009). Klahr (2010) asserts “the burden of proof is on constructivists to define 
a set of instructional goals, an unambiguous description of instructional 
processes, a clear way to ensure implementation fidelity, and then to perform 
a rigorous assessment of effects” (p. 4). Some constructivists have expressed 
resistance to direct rigorous comparisons of these different instructional 
approaches, arguing that due to fundamental differences between constructivist 
pedagogies and direct instruction, no common research method can evaluate 
the two (Jonassen, 2009). Alternatively, Klahr states, “Constructivists cannot 
use complexity of treatments or assessments as an excuse to avoid 
rigorous evaluations of the effectiveness of an instructional process” (p. 3). 
Similarly, Mayer (2004) recommends that we “move educational reform efforts 
from the fuzzy and unproductive world of ideology—which sometimes hides 
under the various banners of constructivism—to the sharp and productive 
world of theory-based research on how people learn” (p. 18). 

Toward these challenges, this study adds to a small, extant set of rigorous 
studies on the effects of curriculum interventions based in research on 
constructivist learning (e.g., Clements & Sarama, 2008; Lynch, Pyke, & 
Grafton, 2012). In this study, we conclude that the combination of 
research-based curriculum materials and curriculum-based PD was effective, 
observing a positive treatment effect on students’ science achievement. This 
finding is consistent with other recent studies of interventions that combine 
curriculum materials and PD (e.g., August et al., 2014; Domitrovich et al., 
2009). The size of the effect in this study is within an expected range for 
high school interventions and for when a broadly focused outcome measure 
(state achievement test) is used. 

This efficacy study is unique in its formal test of the effect of curriculum 
materials that simultaneously embody several theoretical frameworks, 
including constructivism; educative curriculum materials; and curriculum 
coherence, focus, and rigor. Further, as a constructivist learning model 
was integral to An Inquiry Approach, findings from this study meet the 
request of scholars such as Klahr and Mayer by serving as examples of 
rigorous evidence in support of constructivist approaches. In the same vein, the 
findings of this study challenge the calls for more direct instruction made by 
Kirschner et al. (2006), Stull and Mayer (2007), and Kirschner and van 
Merriënboer (2013). 

The primary treatment effect detected in this study is necessarily limited 
in specificity as the multifaceted nature of this intervention and the nature of 
the study design prevented us from isolating the unique effects of discrete 
program features. For example, in this study it was not possible to disentangle 
the unique effects of formative assessment from the effects of the instructional 
model or the effects of educative materials from those of face-to-face 
PD. We think that testing the efficacy of individual features of this intervention 
is useful for some features but not for others as some are too interrelated 
with other features to be adequately isolated. 

Further, the results of tests for whether the intervention led to more 
equitable outcomes or “outputs” (Lynch, 2001) for students remain inconclusive. 
Some treatment-demographic interaction effects suggested more equitable 
outcomes for the treatment group while others suggested more equity 
in the BaU comparison group. None of the interaction effects were statistically 
significant, so it is not entirely clear whether the equity-focused features 
of the materials had systematic effects on achieving equitable outcomes 
across demographic groups. 

On the other hand, more compelling results come from the exploratory 
mediation analysis where we observed a strong treatment effect on teacher 
practice (RTOP) and a positive teacher practice effect on student achievement. 
This mediation result suggests that (a) teaching practice can be 
improved by educative, research-based instructional materials in concert 
with face-to-face, curriculum-based PD and (b) teaching practice indeed 
matters over and above the inherent features of the curriculum materials 
for students. 

Mediation results such as these could have larger implications, especially 
if the role of teacher practice continues to be observed as highly influential 
in future efficacy studies of curriculum programs. Specifically, if it is widely 
observed that the effects of research-based curriculum interventions tend to 
be nonsignificant once the influence of teacher practice is controlled, what 
does that mean for effectiveness and scale-up studies as defined by the 
Common Guidelines for Education Research and Development (IES/NSF, 
2013)? In this document, each of these types of impact studies includes 
a test of effects under routine (non-idealized) conditions. If routine conditions 
were to mean the removal of PD support and the results of this study 
prove to be consistent with those of the field at large, it seems unlikely that 
many interventions that require significant teacher expertise to implement 
will produce positive effects. We suggest then that the field consider a notion 
of curriculum-based interventions where corresponding PD is a standard 
program feature and not an upgrade that can be disregarded or deemed 
as optional to successful implementation. Acknowledging that adding PD 
support will increase the cost to school districts of implementing research-based 
programs, we suggest from the results of this study that the additional 
cost is a worthwhile investment. 